Learning to Extract Attribute Values from a Search Engine with Few Examples

Authors

  • Xingxing Zhang
  • Tao Ge
  • Zhifang Sui
Abstract

We propose an attribute value extraction method based on analysing snippets from a search engine. First, a pattern-based detector is applied to locate candidate attribute values in the snippets. Then a classifier predicts whether each candidate value is correct. Training such a classifier requires only a few annotated triples, because sufficient training data can be generated automatically by matching these triples back to snippets and titles. Finally, because a correct value may appear in multiple snippets, the individual predictions are combined by voting to exploit this redundancy. Experiments on both Chinese and English corpora in the celebrity domain demonstrate the effectiveness of our method: with only 15 annotated triples, precision exceeds 85% for 7 of 12 attributes. Compared to a state-of-the-art method, 11 of 12 attributes show improvements.
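
The pipeline described in the abstract (pattern-based detection, automatic labelling from seed triples, per-snippet classification, voting) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the year regex, the function names, the toy snippets, and the keyword rule standing in for the trained classifier are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical detector pattern for a "birth year" attribute (an assumption;
# the paper's actual patterns are not given in the abstract).
YEAR_PATTERN = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def detect_candidates(snippet):
    """Pattern-based detector: return candidate values found in one snippet."""
    return YEAR_PATTERN.findall(snippet)


def label_candidates(snippets, seed_value):
    """Generate training examples by matching an annotated value back to snippets.

    A candidate is labelled positive when it equals the seed triple's value.
    """
    examples = []
    for snippet in snippets:
        for candidate in detect_candidates(snippet):
            examples.append((snippet, candidate, candidate == seed_value))
    return examples


def keyword_classifier(snippet, candidate):
    """Stand-in for the trained classifier (an assumption, not a learned model).

    Accepts a candidate when the word "born" appears before it in the snippet.
    """
    prefix = snippet[: snippet.find(candidate)]
    return "born" in prefix.lower()


def vote(snippets, classifier):
    """Combine per-snippet predictions: each accepted candidate gets one vote."""
    votes = Counter()
    for snippet in snippets:
        for candidate in detect_candidates(snippet):
            if classifier(snippet, candidate):
                votes[candidate] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy snippets written for this example, not taken from the paper's corpora.
snippets = [
    "Ada Lovelace was born in 1815 in London.",
    "Lovelace (born 1815) died in 1852.",
    "She published her notes in 1843.",
]
training_examples = label_candidates(snippets, seed_value="1815")
best_value = vote(snippets, keyword_classifier)
```

In this toy run the stand-in classifier wrongly accepts one candidate in the second snippet, and voting across snippets still recovers the value that is accepted most often, which is the redundancy effect the abstract relies on.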

Similar resources

Learning cross-level certain and possible rules by rough sets

Machine learning can extract desired knowledge and ease the development bottleneck in building expert systems. Among the proposed approaches, deriving rules from training examples is the most common. Given a set of examples, a learning program tries to induce rules that describe each class. Recently, the rough-set theory has been widely used in dealing with data classification problems. Most of...

A Machine Learning Approach towards Improving Internet Search with a Question-Answering System

This paper introduces a prototype to extract common sense knowledge from the World Wide Web. The prototype combines a search engine with an automated database. It works by extracting information from the enormous amount of documents available on the World Wide Web. Two common examples are that men love women and that women love men (bi-directional relationship) or that boys like toys (unidirect...

Lightly-Supervised Attribute Extraction

Web search engines can greatly benefit from knowledge about attributes of entities present in search queries. In this paper, we introduce lightly-supervised methods for extracting entity attributes from natural language text. Using these methods, we are able to extract large numbers of attributes of different entities at fairly high precision from a large natural language corpus. We compare our...

DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web

The web is a rich resource of structured data. There has been an increasing interest in using web structured data for many applications such as data integration, web search and question answering. In this paper, we present DEXTER, a system to find product sites on the web, and detect and extract product specifications from them. Since product specifications exist in multiple product sites, our ...

Mining from incomplete quantitative data by fuzzy rough sets

Machine learning can extract desired knowledge from existing training examples and ease the development bottleneck in building expert systems. Most learning approaches derive rules from complete data sets. If some attribute values are unknown in a data set, it is called incomplete. Learning from incomplete data sets is usually more difficult than learning from complete data sets. In the past, t...

Publication date: 2013